The aim of this notebook is to cluster the samples.
I sequenced these samples with Prime-seq on extracted RNAs already. Results were not that great. Here reprocess the same samples with TIRE-seq for a head to head comparison.
The processing went as planned. Full writeup available at ELN link
This was generated in 1A_generateSCE-human
sce <- readRDS(here::here(
"data/TIRE_Tcell/SCEs/tcell_basic.sce.rds"
))
sce_orig <- sce
tb <- as_tibble(colData(sce))This is explored more fully in notebook 1B. Here I recap only the
important point.
Row H was largely a failure. This is donor 64, CD8 T cells.
plt1 <- ggplot(tb,
aes(x = Timepoint, y= sum, colour = Donor, label=Well)) +
geom_text() +
ylab("UMIs") +
xlab("Plate") +
scale_y_continuous(trans='log10') +
annotation_logticks(base = 10, sides = "l") +
scale_colour_brewer(palette = "Dark2")
plt1Remove outliers. 13 samples are removed.
I won’t use mitochondrial genes as a criteria here because the general
increase in transcription upon stimulation.
## low_lib_size low_n_features discard
## 11 13 13
sce <- sce_orig[,!discarded$discard]
discarded <- as_tibble(discarded)
# Summarise how many cells left
cell_drop_tb <- rbind(
cbind(length(colnames(sce_orig)), length(colnames(sce)))
)
colnames(cell_drop_tb) <- c("Before_Filter", "After_filter")
cell_drop_tb## Before_Filter After_filter
## [1,] 96 83
tb <- as_tibble(colData(sce))
plt8 <- ggplot(data=tb, aes(y=sum+1, x=detected, colour=Timepoint, label=Well)) +
geom_text(size=3) +
scale_colour_brewer(type="qualitative", palette = "Dark2") +
xlab("Genes") + ylab("UMIs") +
scale_y_continuous(trans='log10') +
annotation_logticks(base = 10, sides = "l")
plt8Library size normalization and transformation
Select the top 1000 most variable genes.
dec.sce <- modelGeneVar(sce)
fit.sce <- metadata(dec.sce)
hvg.sce.var <- getTopHVGs(dec.sce, n=1000)
sce <- runPCA(sce, subset_row=hvg.sce.var, ncomponents=30)Visulaise the fit
Technical variation is nice and low. Some clear genes above technical variation
plt2 <- ggplot(tb,
aes(x = Mean, y= Variance)) +
geom_point(alpha = 0.2, size=0.5) +
guides(colour = guide_legend(override.aes = list(size=2, alpha=1))) +
xlab("Mean of log-expression") +
ylab("Variance of log-expression") +
scale_colour_brewer(type="qualitative", palette = "Dark2") +
geom_function(fun = fit.sce$trend, colour = "darkgreen") +
theme_Publication(base_size = 16)
plt2The coloured line represents technical variation
Principal component analysis is a linear dimension reduction so the
distances are interpretable.
It preserves local differences so not as good at visualising non-linear
differences.
A PCA 1 and 2 of 38% and 16% is good for RNA-seq data.
# Convert to ordered factor with numeric day values
sce$Timepoint <- factor(sce$Timepoint,
levels = str_sort(unique(sce$Timepoint), numeric = TRUE),
ordered = TRUE)A clear difference in CD4 vs CD8 that is greater than the donor effect.
plt5 <- plotPCA(sce, colour_by="Timepoint", shape_by="Subset", point_size=2) +
guides(colour=guide_legend(ncol=2)) +
theme_Publication(base_size=16)
plt5plt4 <- plotPCA(sce, colour_by="Timepoint", shape_by="Donor", point_size=2) +
guides(colour=guide_legend(ncol=2)) +
theme_Publication(base_size=16)
plt4UMAP is a non-linear dimension reduction and emphasises global structure. The distances between groups don’t mean anything. You can say something about how tightly a group packs into a cluster.
Very curious that day 1 and day 2 has the most dramatic differences and day 0 does not.
In the PCA plots day 0 is more distinct.
plt10 <- plotReducedDim(sce, dimred="UMAP", colour_by="Timepoint",shape_by="Subset", point_size=2) +
guides(colour=guide_legend(ncol=2)) +
theme_Publication(base_size = 16)
plt10Differential expression testing.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
##
## Matrix products: default
## BLAS: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggthemes_5.1.0 here_1.0.1
## [3] knitr_1.48 patchwork_1.3.0
## [5] platetools_0.1.7 scater_1.32.1
## [7] scran_1.32.0 scuttle_1.14.0
## [9] lubridate_1.9.3 forcats_1.0.0
## [11] stringr_1.5.1 dplyr_1.1.4
## [13] purrr_1.0.2 readr_2.1.5
## [15] tidyr_1.3.1 tibble_3.2.1
## [17] ggplot2_3.5.1 tidyverse_2.0.0
## [19] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [21] Biobase_2.64.0 GenomicRanges_1.56.2
## [23] GenomeInfoDb_1.40.1 IRanges_2.38.1
## [25] S4Vectors_0.42.1 BiocGenerics_0.50.0
## [27] MatrixGenerics_1.16.0 matrixStats_1.4.1
##
## loaded via a namespace (and not attached):
## [1] gridExtra_2.3 rlang_1.1.4
## [3] magrittr_2.0.3 compiler_4.4.1
## [5] DelayedMatrixStats_1.26.0 vctrs_0.6.5
## [7] pkgconfig_2.0.3 crayon_1.5.3
## [9] fastmap_1.2.0 XVector_0.44.0
## [11] labeling_0.4.3 utf8_1.2.4
## [13] rmarkdown_2.28 tzdb_0.4.0
## [15] UCSC.utils_1.0.0 ggbeeswarm_0.7.2
## [17] xfun_0.48 bluster_1.14.0
## [19] zlibbioc_1.50.0 cachem_1.1.0
## [21] beachmat_2.20.0 jsonlite_1.8.9
## [23] highr_0.11 DelayedArray_0.30.1
## [25] BiocParallel_1.38.0 irlba_2.3.5.1
## [27] parallel_4.4.1 cluster_2.1.6
## [29] R6_2.5.1 RColorBrewer_1.1-3
## [31] bslib_0.8.0 stringi_1.8.4
## [33] limma_3.60.6 jquerylib_0.1.4
## [35] Rcpp_1.0.13 FNN_1.1.4.1
## [37] Matrix_1.7-0 igraph_2.0.3
## [39] timechange_0.3.0 tidyselect_1.2.1
## [41] rstudioapi_0.17.0 abind_1.4-8
## [43] yaml_2.3.10 viridis_0.6.5
## [45] codetools_0.2-20 lattice_0.22-6
## [47] withr_3.0.1 evaluate_1.0.1
## [49] pillar_1.9.0 generics_0.1.3
## [51] rprojroot_2.0.4 hms_1.1.3
## [53] sparseMatrixStats_1.16.0 munsell_0.5.1
## [55] scales_1.3.0 glue_1.8.0
## [57] metapod_1.12.0 tools_4.4.1
## [59] BiocNeighbors_1.22.0 ScaledMatrix_1.12.0
## [61] locfit_1.5-9.10 cowplot_1.1.3
## [63] edgeR_4.2.2 colorspace_2.1-1
## [65] GenomeInfoDbData_1.2.12 beeswarm_0.4.0
## [67] BiocSingular_1.20.0 vipor_0.4.7
## [69] cli_3.6.3 rsvd_1.0.5
## [71] fansi_1.0.6 S4Arrays_1.4.1
## [73] viridisLite_0.4.2 uwot_0.2.2
## [75] gtable_0.3.5 sass_0.4.9
## [77] digest_0.6.37 SparseArray_1.4.8
## [79] ggrepel_0.9.6 dqrng_0.4.1
## [81] farver_2.1.2 htmltools_0.5.8.1
## [83] lifecycle_1.0.4 httr_1.4.7
## [85] statmod_1.5.0